Abstract: We address the traffic light control problem for multiple intersections in tandem by 
viewing it as a stochastic hybrid system and developing a Stochastic Flow Model (SFM) for it. 
Using Infinitesimal Perturbation Analysis (IPA), we derive on-line gradient estimates of a cost 
metric with respect to the controllable green and red cycle lengths. The IPA estimators obtained 
require counting traffic light switchings and estimating car flow rates only when specific events 
occur. The estimators are used to iteratively adjust light cycle lengths to improve performance 
and, in conjunction with a standard gradient-based algorithm, to obtain optimal values which 
adapt to changing traffic conditions. Simulation results are included to illustrate the approach. 

Keywords: Traffic Light Control, SFM, IPA. 



1. INTRODUCTION 

The Traffic Light Control (TLC) problem aims at dy- 
namically controlling the flow of traffic at an intersection 
through the timing of green/red light cycles with the 
objective of reducing congestion, hence also the delays 
incurred by drivers. The more general problem involves 
a set of intersections and traffic lights with the objective 
of reducing overall congestion over an area covering mul- 
tiple urban blocks. Since the control of one intersection 
influences the traffic flow from or towards others, this is 
a complex problem further complicated by the fact that 
traffic flows constantly change depending on the time of 
day, accidents, weather conditions, etc. Recent technologi- 
cal developments involving better, inexpensive sensors and 
wireless sensor networks have enabled the collection of 
data (e.g., counting vehicles in a specific road section) 
which can be used to optimally select traffic light cycles 
over specific time intervals in a day or even to dynamically 
control them based on real-time data. Thus, methodologies 
that would not be possible to implement not long ago 
are now becoming feasible. The approach proposed in 
this paper to the TLC problem is specifically intended to 
exploit these recent developments. 

Several different approaches have been proposed to solve
the TLC problem. It is formulated as a Mixed Integer
Linear Programming (MILP) problem in Dujardin
et al. [2011], and as an Extended Linear Complementarity
Problem (ELCP) in DeSchutter [1999]. A Markov Decision
Process (MDP) approach has been proposed in Yu
and Recker [2006] and Reinforcement Learning (RL) was
used in Thorpe [1997], with several extensions found in
Prashanth and Bhatnagar [2011], Wiering et al. [2004]. A
game theoretic viewpoint is given in Alvarez and Poznyak
[2010], while a hybrid system formulation is presented in
Zhao and Chen [2003]. Because the TLC problem is complex
when viewed as an optimization problem, fuzzy logic is often
used for both a single (isolated) junction, Murat and Gedizlioglu
[2005], and multiple junctions, Choi et al. [2002]. Expert
systems, Findler and Stapp [1992], and evolutionary algorithms,
Taale et al. [1998], have also been applied to develop
a traffic light controller for a single intersection. Perturbation
analysis techniques were used in Head et al. [1996]
and a formal approach using Infinitesimal Perturbation
Analysis (IPA) to solve the TLC problem was presented
in Panayiotou et al. [2005] for a single intersection.

In Geng and Cassandras [2012], we study the TLC problem 
for a single intersection using a Stochastic Flow Model 
(SFM) and Infinitesimal Perturbation Analysis (IPA). In 
this paper, we extend our analysis to two tandem intersections.
We still adopt a stochastic hybrid system modeling
framework (see Cassandras and Lafortune [2008], Cassandras
and Lygeros [2006]), since the problem involves both
event-driven dynamics in the switching of traffic lights 
and time-driven dynamics that capture the flow of vehicles 
through an intersection. Although one can also view this 
as a purely Discrete Event System (DES) with the inter- 
section area as a "server" processing "users" (vehicles), 
the fact that a vehicle does not exclusively occupy this 
area makes a flow-based viewpoint a more accurate way 
to model such a process. While in most traditional flow 
models the flow rates involved are treated as deterministic 
parameters, a SFM as introduced in Cassandras et al. 
[2002] treats them as stochastic processes. In the TLC 
problem, this is consistent with continuously and randomly 
varying traffic flows, especially in heavy traffic conditions 
where the problem is most interesting. With only minor 
technical assumptions imposed on the properties of such 
processes, a general IPA theory for stochastic hybrid systems
was recently presented in Wardi et al. [2010] and Cassandras
et al. [2010], through which one can estimate on-line
gradients of certain performance measures with respect 
to various controllable parameters. These estimates may 
be incorporated in standard gradient-based algorithms to 
optimize system parameter settings. IPA was originally 
developed as a technique for evaluating gradients of sample 
performance functions in queueing systems and using them 
as unbiased gradient estimates of performance metrics. 
However, IPA estimates become biased when dealing with 
aspects of queueing systems such as multiple user classes, 
blocking due to limited resource capacities, and various 
forms of feedback control. The use of IPA in stochastic 
hybrid systems, however, circumvents these limitations 
and yields simple unbiased gradient estimates (under mild 
technical conditions) of useful metrics even in the presence 
of blocking and a variety of feedback control mechanisms 
(see Yao and Cassandras [2011a].) 

In Section 2, we formulate the TLC problem for two 
intersections and construct a SFM. In Section 3, we derive 
an IPA estimator for a cost function gradient with respect 
to a controllable parameter vector defined by green and red 
cycle lengths. This is then used to iteratively adjust these 
cycle lengths to improve performance and, under proper 
conditions, obtain optimal parameter values. Simulation- 
based examples are given in Section 4 and we conclude 
with Section 5. 

2. PROBLEM FORMULATION 

In this paper, we concentrate on solving the TLC problem 
for two coupled intersections, as shown in Fig. 1. There 
are four roads and four traffic lights, with each traffic light 
controlling the associated incoming traffic flow. The traffic 
in road 1 of intersection $I_1$ flows into road 3 of $I_2$. For
simplicity, we make the following assumptions: (i) Left-turn
and right-turn traffic flows are not considered, i.e.,
traffic lights only control vehicles going straight. (ii) A
YELLOW light is combined with a RED light (therefore,
the YELLOW light duration is not explicitly controlled).
(iii) Road 3 is long enough that cars accumulated in it do
not influence the departure process of road 1.
Fig. 1. Two tandem traffic intersections

The system involves a number of stochastic processes
which are all defined on a common probability space
$(\Omega, \mathcal{F}, P)$. Each of the four roads is considered as a queue
with a random arrival flow process $\{\alpha_n(t)\}$, $n = 1, \ldots, 4$,
where $\alpha_n(t)$ is the instantaneous vehicle arrival rate at
time $t$. When the traffic light corresponding to road $n$
is GREEN, the departure flow process is denoted by
$\{\beta_n(t)\}$, $n = 1, \ldots, 4$. Let the GREEN light duration in
a cycle of queue $n$ be $\theta_n$, so that the controllable parameter
vector of interest is $\theta = [\theta_1, \ldots, \theta_4]$. We define a state
vector $x(\theta,t) = [x_1(\theta,t), \ldots, x_4(\theta,t)]$ where $x_n(\theta,t) \in \mathbb{R}^+$
is the content of queue $n = 1, \ldots, 4$. We use the notation
$x_n(\theta,t)$ to emphasize the dependence of the queue content
on $\theta$; however, for notational simplicity, we will write $x_n(t)$
when no confusion arises. We also define a left-continuous
"clock" state variable $z_n(t)$, $n = 1, \ldots, 4$, associated with
the GREEN light cycle for queue $n$ as follows:

$$\dot{z}_n(t) = \begin{cases} 1 & \text{if } 0 < z_n(t) < \theta_n \text{ or } z_{\bar{n}}(t) = \theta_{\bar{n}} \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

$$z_n(t^+) = 0 \quad \text{if } z_n(t) = \theta_n$$

where $\bar{n}$ is the index of the road perpendicular to road $n$
at the same intersection. We set $z(t) = [z_1(t), \ldots, z_4(t)]$.
Thus, $z_n(t)$ measures the time since the last switch from
RED to GREEN of the traffic light for queue $n$. It is reset
to 0 as soon as the GREEN cycle length $\theta_n$ is reached
and remains at this value while the light is GREEN for
queue $\bar{n}$; as soon as that cycle ends, i.e., $z_{\bar{n}}(t) = \theta_{\bar{n}}$, then
$\dot{z}_n(t) = 1$ and the process repeats.
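The clock dynamics in (1) can be illustrated with a short deterministic function. The following is a minimal Python sketch, assuming (for illustration only) that road n turns GREEN at t = 0 and that the two perpendicular roads alternate periodically; the name `clock_and_light` and this initial condition are not part of the model above, and the boundary instants are handled in a simplified way.

```python
def clock_and_light(t, theta_n, theta_nbar):
    """Return (z_n, green_n) at time t for one road of an intersection.

    Assumption: road n turns GREEN at t = 0, stays GREEN for theta_n,
    then is RED for theta_nbar (the GREEN duration of the perpendicular
    road), and the pattern repeats.
    """
    cycle = theta_n + theta_nbar
    phase = t % cycle
    if phase < theta_n:
        return phase, True   # GREEN: the clock z_n is running
    return 0.0, False        # RED: the clock was reset and is held at 0


# With theta_n = 30 and theta_nbar = 20, road n is GREEN on [0, 30)
# and RED on [30, 50) in every 50-second cycle.
print(clock_and_light(10, 30, 20))
print(clock_and_light(35, 30, 20))
```

The Boolean returned here plays the role of the GREEN-light condition in (1): it is true exactly while the clock is running.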

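To simplify notation, we set $B_n(z,\theta) = 1$ if the Boolean
expression used in (1), i.e., $[0 < z_n(t) < \theta_n$ or $z_{\bar{n}}(t) = \theta_{\bar{n}}]$,
is true (light is GREEN) and $B_n(z,\theta) = 0$ otherwise. We
can now write the dynamics of each state variable $x_n(t)$ as
follows:

$$\dot{x}_n(t) = \begin{cases} \alpha_n(t) & \text{if } B_n(z,\theta) = 0 \\ 0 & \text{if } x_n(t) = 0 \text{ and } \alpha_n(t) \le \beta_n(t) \\ \alpha_n(t) - \beta_n(t) & \text{otherwise} \end{cases} \tag{2}$$

where

$$\beta_n(t) = \begin{cases} h_n(t) & \text{if } B_n(z,\theta) = 1 \text{ and } x_n(t) > 0 \\ \alpha_n(t) & \text{if } B_n(z,\theta) = 1 \text{ and } x_n(t) = 0 \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

In (3), $h_n(t)$ describes the departure process if the road
is not empty. According to assumption (iii), $\beta_1(t)$ is
independent of $x_3(t)$. However, $\alpha_3(t)$ depends on the
departure process $\beta_1(t)$ of queue 1; in particular,

$$\alpha_3(t) = \beta_1(t) \tag{4}$$

Combining (2) through (4), we have the dynamics of queue 3:

$$\dot{x}_3(t) = \left\{ \begin{array}{lll}
h_1(t) & \text{if } B_3(z,\theta) = 0,\ B_1(z,\theta) = 1, \text{ and } x_1(t) > 0 & (5.1) \\
\alpha_1(t) & \text{if } B_3(z,\theta) = 0,\ B_1(z,\theta) = 1, \text{ and } x_1(t) = 0 & (5.2) \\
0 & \text{if } B_3(z,\theta) = 0,\ B_1(z,\theta) = 0, \text{ or } B_3(z,\theta) = 1,\ x_3(t) = 0 & (5.3) \\
h_1(t) - h_3(t) & \text{if } B_3(z,\theta) = 1,\ B_1(z,\theta) = 1, \text{ and } x_3(t) > 0,\ x_1(t) > 0 & (5.4) \\
\alpha_1(t) - h_3(t) & \text{if } B_3(z,\theta) = 1,\ B_1(z,\theta) = 1, \text{ and } x_3(t) > 0,\ x_1(t) = 0 & (5.5) \\
-h_3(t) & \text{if } B_3(z,\theta) = 1,\ B_1(z,\theta) = 0, \text{ and } x_3(t) > 0 & (5.6)
\end{array} \right. \tag{5}$$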

The operation of the intersection can be viewed as a hybrid
system with the time-driven dynamics described by (2)-(5)
and event-driven dynamics dictated by GREEN-RED light
switches and by events causing some $x_n(t)$ to switch from
positive to zero or vice versa.
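The case structure of the queue 3 dynamics can be checked mechanically. The following is a minimal Python sketch, assuming the rates are given as plain numbers at a single time instant; the function names `beta` and `xdot3` are illustrative only. It composes the departure rate (3) of queue 1 with the coupling (4) and reproduces the six cases of (5).

```python
def beta(green, x, alpha, h):
    """Departure rate of a queue, following (3)."""
    if not green:
        return 0.0
    return h if x > 0 else alpha


def xdot3(green1, green3, x1, x3, alpha1, h1, h3):
    """Time derivative of the queue 3 content, following (5.1)-(5.6)."""
    alpha3 = beta(green1, x1, alpha1, h1)  # (4): inflow of road 3 = outflow of road 1
    if not green3:
        return alpha3       # (5.1), (5.2), or 0 when light 1 is also RED
    if x3 <= 0:
        return 0.0          # second condition of (5.3)
    return alpha3 - h3      # (5.4), (5.5), (5.6)


# Case (5.5): both lights GREEN, queue 1 empty, queue 3 non-empty.
print(xdot3(True, True, 0.0, 3.0, 0.25, 1.0, 0.5))
```

Writing (5) this way makes explicit that queue 3 only sees queue 1 through its departure rate, which is what assumption (iii) permits.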



Fig. 4. Three ways for starting a NEP 



Fig. 2. The Stochastic Hybrid Automaton model 

Using the standard definition of a Stochastic Hybrid
Automaton (SHA) (e.g., see Cassandras and Lafortune
[2008]), we may obtain SHA models for queues 1, 2 and
4 which are similar to the one in Geng and Cassandras [2012]. Here,
we concentrate on the SHA for the operation of queue 3 as
shown in Fig. 2. This reflects the fact that a typical sample
path of any one of the queue contents (as shown in Fig. 3)
consists of intervals over which $x_n(t) > 0$, which we call
Non-Empty Periods (NEPs), followed by intervals where
$x_n(t) = 0$, which we call Empty Periods (EPs). Thus, the
entire sample path consists of a series of alternating NEPs
and EPs. The event set that affects any queue $n = 1, 2, 4$
is $\Phi_n = \{e_1, e_2, e_3, e_4, e_5\}$ where $e_1$ is a switch in the sign
of $\alpha_n(t) - \beta_n(t)$ from non-positive to strictly positive, $e_2$ is
a switch in the sign of $\alpha_n(t)$ from 0 to strictly positive, $e_3$
is the queue content becoming empty, i.e., $x_n = 0$, which
terminates a NEP (and initiates an EP), $e_4$ switches a
light from RED to GREEN, and $e_5$ switches a light from
GREEN to RED. For easier reference, we label $e_3$ as "$E_n$"
for the end of NEP events, $e_4$ as "$R2G_n$" and $e_5$ as "$G2R_n$"
for the light switching events. The resulting start of a NEP
is an event "induced" by either $e_5$ or $e_2$ or $e_1$ which we
will refer to as an "$S_n$" event. For queue 3, the event set
includes all those events that cause a jump in the value
of $\dot{x}_3(t)$ in (5). As we can see from Fig. 2, every event of
$\Phi_1$ also affects the dynamics of queue 3. Thus, we have
$\Phi_3 = \{S_1, E_1, R2G_1, G2R_1, S_3, E_3, R2G_3, G2R_3\}$.

Returning to Fig. 3, the $m$th NEP in a sample path of
queue 3, $m = 1, 2, \ldots$, is denoted by $[\xi_{3,m}, \eta_{3,m})$, i.e., $\xi_{3,m}$,
$\eta_{3,m}$ are the occurrence times of the $m$th $S_3$ and $E_3$ event
respectively at this queue. During the $m$th NEP, $t^j_{3,m}$,
$j = 1, \ldots, J_{3,m}$, denotes the time when an event occurs.

Fig. 3. A typical sample path of traffic light queue 3



Our objective is to select $\theta$ so as to minimize a cost
function that measures a weighted mean of the queue
lengths over a fixed time interval $[0, T]$. In particular, we
define the sample function

$$L(\theta; x(0), z(0), T) = \frac{1}{T} \sum_{n=1}^{4} \int_0^T w_n x_n(\theta, t)\,dt \tag{6}$$

where $w_n$ is a cost weight associated with queue $n$ and
$x(0)$, $z(0)$ are given initial conditions. Since $x_n(t) = 0$
during EPs of queue $n$, we can rewrite (6)
in the form

$$L(\theta; x(0), z(0), T) = \frac{1}{T} \sum_{n=1}^{4} \sum_{m=1}^{M_n} \int_{\xi_{n,m}}^{\eta_{n,m}} w_n x_n(\theta, t)\,dt \tag{7}$$

where $M_n$ is the total number of NEPs during the sample
path of queue $n$. For convenience, we also define

$$L_{n,m}(\theta) = \int_{\xi_{n,m}}^{\eta_{n,m}} x_n(\theta, t)\,dt \tag{8}$$

to be the sample cost associated with the $m$th NEP of
queue $n$. We can now define our overall performance metric
as

$$J(\theta; x(0), z(0), T) = E[L(\theta; x(0), z(0), T)] \tag{9}$$

Since we do not impose any limitations on the processes
$\{\alpha_n(t)\}$ and $\{\beta_n(t)\}$, it is infeasible to obtain a closed-form
expression of $J(\theta; x(0), z(0), T)$. The only assumption we
make is that $\alpha_n(t)$, $\beta_n(t)$ are piecewise continuous w.p. 1.
The value of IPA as developed for general stochastic hybrid
systems in Cassandras et al. [2010] is in providing the
means to estimate the performance metric gradient $\nabla J(\theta)$
by evaluating the sample gradient $\nabla L(\theta)$. As shown elsewhere
(e.g., see Cassandras et al. [2010]), these estimates
are unbiased under mild technical conditions. Moreover,
an important property of IPA estimates is that they are
often independent of the unknown processes $\{\alpha_n(t)\}$ and
$\{\beta_n(t)\}$ or they depend on values of $\alpha_n(t)$ or $\beta_n(t)$ at
specific event times only. Such robustness properties of
IPA (formally established in Yao and Cassandras [2011b])
make it attractive for estimating on-line performance sensitivities
with respect to controllable parameters such as
$\theta$ in our case. One can then use this information to either
improve performance or, under appropriate conditions,
solve an optimization problem and determine an optimal
$\theta^*$ through an iterative scheme:

$$\theta_{i,k+1} = \theta_{i,k} - \gamma_k H_{i,k}(\theta_k, x(0), T, \omega_k), \quad k = 0, 1, \ldots \tag{10}$$

where $H_{i,k}(\theta_k, x(0), T, \omega_k)$ is an estimate of $dJ/d\theta_i$ based
on the information obtained from the sample path denoted
by $\omega_k$, and $\gamma_k$ is the step size at the $k$th iteration. Next
we will focus on how to obtain $dL/d\theta_i$, $i = 1, 2, 3, 4$. We
may then also obtain $\theta^*$ through (10), provided that the
random processes $\{\alpha_n(t)\}$ and $\{\beta_n(t)\}$ are stationary over
$[0, T]$. We will assume that the derivatives $dL/d\theta_i$ exist for
all $\theta_i \in \mathbb{R}^+$ w.p. 1 (if this is violated, IPA is still possible
by considering one-sided derivatives; see Cassandras et al.
[2002]).
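The iteration (10) can be sketched in a few lines. The following is a minimal Python illustration under stated assumptions: the gradient estimate is supplied as a callable, the step size is the diminishing sequence 1/(k+1), and each component is clipped to a feasible interval after every step. The clipping, the step-size rule, and the toy quadratic gradient `toy_gradient` are hypothetical stand-ins and do not come from the IPA estimator itself.

```python
import random

_rng = random.Random(0)  # fixed seed so the sketch is reproducible


def projected_gradient(theta0, grad_estimate, theta_min, theta_max,
                       iterations=200, step0=1.0):
    """Iterate theta_{i,k+1} = theta_{i,k} - gamma_k * H_{i,k}, as in (10),
    clipping each component to [theta_min, theta_max] (an added assumption)."""
    theta = list(theta0)
    for k in range(iterations):
        gamma_k = step0 / (k + 1)   # diminishing step size (assumed choice)
        h = grad_estimate(theta)    # one gradient estimate per sample path
        theta = [min(theta_max, max(theta_min, th - gamma_k * hi))
                 for th, hi in zip(theta, h)]
    return theta


def toy_gradient(theta):
    """Hypothetical noisy gradient of sum((theta_i - 20)^2); stands in for IPA."""
    return [2.0 * (th - 20.0) + _rng.gauss(0.0, 0.5) for th in theta]


theta_star = projected_gradient([25.0, 30.0, 30.0, 25.0], toy_gradient, 15.0, 40.0)
print(theta_star)
```

In an actual application, `grad_estimate` would run or observe one sample path of length T at the current parameter value and return the four sample derivatives.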

3. INFINITESIMAL PERTURBATION ANALYSIS 
(IPA) 

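Consider a sample path of the system as modeled in Fig. 2
over $[0, T]$ and let $\tau_k(\theta)$ denote the occurrence time of the
$k$th event (of any type), where we stress its dependence
on $\theta$. To simplify notation, we define the derivatives of
the states $x_n(\theta, t)$ and $z_n(\theta, t)$ and event times $\tau_k(\theta)$ with
respect to $\theta_i$, $i = 1, \ldots, 4$, as follows:

$$x'_{n,i}(t) \equiv \frac{\partial x_n(\theta, t)}{\partial \theta_i}, \quad z'_{n,i}(t) \equiv \frac{\partial z_n(\theta, t)}{\partial \theta_i}, \quad \tau'_{k,i} \equiv \frac{\partial \tau_k(\theta)}{\partial \theta_i} \tag{11}$$

Taking derivatives with respect to $\theta_i$ in (7), we obtain

$$\frac{dL(\theta)}{d\theta_i} = \frac{1}{T} \sum_{n=1}^{4} \sum_{m=1}^{M_n} \left[ w_n x_n(\eta_{n,m}) \frac{\partial \eta_{n,m}}{\partial \theta_i} - w_n x_n(\xi_{n,m}) \frac{\partial \xi_{n,m}}{\partial \theta_i} + \int_{\xi_{n,m}}^{\eta_{n,m}} w_n x'_{n,i}(t)\,dt \right]$$

Since, at the start and end of a NEP, $x_n(\xi_{n,m}) = x_n(\eta_{n,m}) = 0$, this reduces to

$$\frac{dL(\theta)}{d\theta_i} = \frac{1}{T} \sum_{n=1}^{4} \sum_{m=1}^{M_n} \int_{\xi_{n,m}}^{\eta_{n,m}} w_n x'_{n,i}(t)\,dt = \frac{1}{T} \sum_{n=1}^{4} \sum_{m=1}^{M_n} w_n \frac{dL_{n,m}(\theta)}{d\theta_i} \tag{12}$$

where the last equality follows from the definition (8).

By assumption (iii), $\beta_1(t)$ is independent of $x_3(t)$. Therefore,
$dL_{n,m}/d\theta_i = 0$ for $n = 1, 2$ and $i = 3, 4$. It follows
that $x'_{n,i}(t)$ for $n = 1, 2$ and $i = 1, 2$ can be obtained by
the analysis of a single isolated intersection in Geng and
Cassandras [2012]. Since equation (2) can still be applied
for $x_4(t)$, we can obtain $x'_{4,i}(t)$, $i = 1, \ldots, 4$, similar to
a queue in an isolated intersection. Therefore, in what
follows, we focus on obtaining $x'_{3,i}(t)$ and hence $dL_{3,m}/d\theta_i$.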

3.1 State Derivatives 

Observe that the determination of the sample derivatives
in (12) depends on the state derivatives $x'_{n,i}(t)$. The
purpose of IPA is to evaluate these derivatives as functions
of observable sample path quantities. We pursue this next,
using the framework established in Cassandras et al. [2010]
where, for arbitrary stochastic hybrid systems, it is shown
that the state and event time derivatives in (11) can be
obtained from three fundamental "IPA equations". For the
sake of self-sufficiency, these equations are rederived here
as they pertain to our specific SFM. Looking at (2) and
Fig. 2, note that the dynamics of $x_n(t)$ are fixed over any
interevent interval $[\tau_k, \tau_{k+1})$ and we write $\dot{x}_n(t) = f_{n,k}(t)$
to represent the appropriate expression on the right-hand
side of (2) over this interval. We have

$$x_n(t) = x_n(\tau_k) + \int_{\tau_k}^{t} f_{n,k}(\tau)\,d\tau$$

and taking derivatives with respect to $\theta_i$, we get

$$x'_{n,i}(t) = x'_{n,i}(\tau_k^-) + \frac{dx_n(\tau_k^-)}{dt}\,\tau'_{k,i} + \int_{\tau_k}^{t} \frac{\partial f_{n,k}(\tau)}{\partial \theta_i}\,d\tau - f_{n,k}(\tau_k)\,\tau'_{k,i} \tag{13}$$

Letting $t = \tau_k^+$ and since $dx_n(\tau_k^-)/dt = f_{n,k-1}(\tau_k^-)$ and
$\partial f_{n,k}(\tau)/\partial \theta_i = 0$ from (2), we obtain

$$x'_{n,i}(\tau_k^+) = x'_{n,i}(\tau_k^-) + \left[ f_{n,k-1}(\tau_k^-) - f_{n,k}(\tau_k^+) \right] \tau'_{k,i} \tag{14}$$

Moreover, taking derivatives with respect to $t$ in (13), we
get, for all $t \in [\tau_k, \tau_{k+1})$,

$$\frac{d}{dt} x'_{n,i}(t) = \frac{\partial f_{n,k}(t)}{\partial \theta_i} = 0 \tag{15}$$

Therefore, $x'_{n,i}(t)$ remains constant over all $t \in [\tau_k, \tau_{k+1})$:

$$x'_{n,i}(t) = x'_{n,i}(\tau_k^+), \quad t \in [\tau_k, \tau_{k+1}) \tag{16}$$

Thus, focusing on a NEP of $x_n(t)$, the queue content
derivative is piecewise constant with jumps occurring
according to (14). The final step is to obtain the event
time derivatives $\tau'_{k,i}$ appearing in (14), which we do next.
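The jump rule (14) is a one-line update once the flow rates on either side of an event and the event time derivative are known. A minimal Python sketch follows; the function name `jump` and the numerical values are illustrative assumptions, not data from the paper.

```python
def jump(x_prime_before, f_before, f_after, tau_prime):
    """Apply (14): state derivative just after an event, given its value
    just before the event, the dynamics on both sides of the event, and
    the event time derivative."""
    return x_prime_before + (f_before - f_after) * tau_prime


# Hypothetical numbers: a queue drains at rate -0.5 before the event and
# fills at rate 0.5 after it, with an event time derivative of 3.
x_prime = jump(0.0, -0.5, 0.5, 3)
print(x_prime)  # by (16), this value is then held until the next event
```

Between events no update is needed, since (16) states that the derivative stays at its post-event value.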

3.2 Event Time Derivatives 

Clearly, $\tau'_{k,i}$ depends on the type of event occurring at time
$\tau_k$. Following the framework in Cassandras et al. [2010],
there are three types of events for a general stochastic
hybrid system. For the purpose of these definitions, let
the continuous state component of the hybrid system be
$x \in X \subseteq \mathbb{R}^N$, $x = [x_1, \ldots, x_N]$, and let $\theta \in \Theta \subseteq \mathbb{R}^M$.

(1) Exogenous Events. An event is exogenous if it
causes a discrete state transition at time $\tau_k$ independent
of the controllable parameter $\theta$. Thus, it satisfies

$$\tau'_{k,i} = 0 \tag{17}$$

(2) Endogenous Events. An event occurring at time
$\tau_k$ is endogenous if there exists a continuously differentiable
function $g_k : \mathbb{R}^N \times \Theta \to \mathbb{R}$ such that

$$\tau_k = \min\{t > \tau_{k-1} : g_k(x(\theta, t), \theta) = 0\} \tag{18}$$

where the function $g_k$ normally corresponds to a
guard condition in a hybrid automaton. Taking
derivatives with respect to $\theta_i$, $i = 1, \ldots, M$, it is
straightforward to obtain

$$\tau'_{k,i} = -\frac{\dfrac{\partial g_k}{\partial \theta_i} + \sum_{j=1}^{N} \dfrac{\partial g_k}{\partial x_j}\, x'_{j,i}(\tau_k^-)}{\sum_{j=1}^{N} \dfrac{\partial g_k}{\partial x_j}\, f_{j,k-1}(\tau_k^-)} \tag{19}$$

(3) Induced Events. An event at time $\tau_k$ is induced if it
is triggered by the occurrence of another event at time
$\tau_m \le \tau_k$. In this case, $\tau'_{k,i}$ depends on the derivative
$\tau'_{m,i}$ (details can be found in Cassandras et al. [2010]).

In the following, we consider each of the event types at 
queue 3 that were identified in the previous section and 
derive the corresponding event time derivatives. Based on 
these, we can then also derive the state derivatives through 
(14) and (16). 
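The endogenous event time derivative (19) can be evaluated directly from the guard gradients. The following is a minimal Python sketch, assuming the partial derivatives of the guard are supplied as plain lists; the function name `endogenous_tau_prime` and the numbers in the example are hypothetical.

```python
def endogenous_tau_prime(dg_dx, dg_dtheta_i, x_prime_before, f_before):
    """Evaluate (19) for one parameter theta_i.

    dg_dx: partial derivatives of the guard g_k with respect to each x_j
    dg_dtheta_i: partial derivative of g_k with respect to theta_i
    x_prime_before: state derivatives x'_{j,i} just before the event
    f_before: state dynamics f_{j,k-1} just before the event
    """
    numerator = dg_dtheta_i + sum(g * xp for g, xp in zip(dg_dx, x_prime_before))
    denominator = sum(g * f for g, f in zip(dg_dx, f_before))
    return -numerator / denominator


# Guard g = x_1 (a queue becoming empty) in a two-state system.
tau_prime = endogenous_tau_prime([1.0, 0.0], 0.0, [2.0, 5.0], [-0.5, 0.25])
print(tau_prime)

# Feeding this into (14) for the emptying queue, whose dynamics are 0 afterwards:
x1_prime_after = 2.0 + (-0.5 - 0.0) * tau_prime
print(x1_prime_after)
```

For a guard of the form g = x_j, the expression collapses to the negative ratio of the state derivative to the flow rate of that queue, and the resulting jump drives the derivative of the emptying queue to zero.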



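(1) Event $E_1$ ends a NEP of queue 1. This is an endogenous
event that occurs when $x_1(\theta, t) = 0$. Thus, when
such an event occurs at $\tau_k$, let $g_k(x(\theta, t), \theta) = x_1(\theta, t) = 0$.
Using (19), we get $\tau'_{k,i} = -x'_{1,i}(\tau_k^-) / f_{1,k-1}(\tau_k^-)$. Looking at (5), we
have either $f_{3,k-1}(\tau_k^-) = h_1(\tau_k) - h_3(\tau_k)$ and $f_{3,k}(\tau_k^+) = \alpha_1(\tau_k) - h_3(\tau_k)$
when $B_3(z, \theta) = 1$, or $f_{3,k-1}(\tau_k^-) = h_1(\tau_k)$ and $f_{3,k}(\tau_k^+) = \alpha_1(\tau_k)$
when $B_3(z, \theta) = 0$. In both
cases, $f_{3,k-1}(\tau_k^-) - f_{3,k}(\tau_k^+) = h_1(\tau_k) - \alpha_1(\tau_k)$. Using
these values in (14) along with $\tau'_{k,i}$ above, and noting from (2)
that $f_{1,k-1}(\tau_k^-) = \alpha_1(\tau_k) - h_1(\tau_k)$, we get

$$x'_{3,i}(\tau_k^+) = x'_{3,i}(\tau_k^-) - \frac{[h_1(\tau_k) - \alpha_1(\tau_k)]\, x'_{1,i}(\tau_k^-)}{\alpha_1(\tau_k) - h_1(\tau_k)} = x'_{3,i}(\tau_k^-) + x'_{1,i}(\tau_k^-), \quad i = 1, \ldots, 4 \tag{20}$$

As we can see, $x'_{3,i}(\tau_k^+)$ explicitly depends on $x'_{1,i}(\tau_k^-)$.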

(2) Event $E_3$ ends a NEP of queue 3. This is an endogenous
event that occurs when $x_3(\theta, t) = 0$. Thus, when
such an event occurs at $\tau_k$, let $g_k(x(\theta, t), \theta) = x_3(\theta, t) = 0$.
Using (19), we get $\tau'_{k,i} = -x'_{3,i}(\tau_k^-) / f_{3,k-1}(\tau_k^-)$. According to (5), we
have $f_{3,k}(\tau_k^+) = 0$. Using these values in (14) along with
$\tau'_{k,i}$ above we get

$$x'_{3,i}(\tau_k^+) = x'_{3,i}(\tau_k^-) - \left[ f_{3,k-1}(\tau_k^-) - 0 \right] \frac{x'_{3,i}(\tau_k^-)}{f_{3,k-1}(\tau_k^-)} = 0$$

Thus, at the end of a NEP $[\xi_{3,m}, \eta_{3,m})$ of queue 3 we have

$$x'_{3,i}(\eta_{3,m}^+) = 0, \quad i = 1, \ldots, 4 \tag{21}$$

indicating that these state derivatives are always reset to
0 upon ending a NEP.

(3) Event $G2R_1$, i.e., the GREEN light of queue 1 switches
to RED. This is an endogenous event that occurs when
$g_k(x(\theta, t), \theta) = z_1(\tau_k) - \theta_1 = 0$. $\tau'_{k,i}$ is determined by the
following lemma.

Lemma 1. Let $\zeta_{1,k}$ be the total number of $G2R_1$ events
that have occurred before or at $\tau_k$, and $\rho_{1,k}$ be the total
number of $R2G_1$ events that have occurred before or at
$\tau_k$. Then, $\tau'_{k,1} = \zeta_{1,k}$, $\tau'_{k,2} = \rho_{1,k}$, $\tau'_{k,3} = 0$ and $\tau'_{k,4} = 0$.

The proof of this lemma can be found in Geng and Cassandras
[2012]. According to (5), we have either $f_{3,k-1}(\tau_k^-) - f_{3,k}(\tau_k^+) = h_1(\tau_k^-)$
(from (5.1)-(5.3), or (5.4)-(5.6)), or
$f_{3,k-1}(\tau_k^-) - f_{3,k}(\tau_k^+) = \alpha_1(\tau_k^-)$ (from (5.2)-(5.3), or (5.5)-(5.6)).
From (3), we can combine these two situations and
simply write $f_{3,k-1}(\tau_k^-) - f_{3,k}(\tau_k^+) = \beta_1(\tau_k^-)$. According
to (14), we get

$$x'_{3,i}(\tau_k^+) = \begin{cases} x'_{3,i}(\tau_k^-) + \beta_1(\tau_k^-)\,\zeta_{1,k} & i = 1 \\ x'_{3,i}(\tau_k^-) + \beta_1(\tau_k^-)\,\rho_{1,k} & i = 2 \\ x'_{3,i}(\tau_k^-) & i = 3, 4 \end{cases} \tag{22}$$



(4) Event $G2R_3$, i.e., the GREEN light of queue 3 switches
to RED. This is an endogenous event that occurs when
$g_k(x(\theta, t), \theta) = z_3(\tau_k) - \theta_3 = 0$. $\tau'_{k,i}$ is determined by the
following lemma.

Lemma 2. Let $\zeta_{3,k}$ be the total number of $G2R_3$ events
that have occurred before or at $\tau_k$, and $\rho_{3,k}$ be the total
number of $R2G_3$ events that have occurred before or at
$\tau_k$. Then, $\tau'_{k,3} = \zeta_{3,k}$, $\tau'_{k,4} = \rho_{3,k}$, $\tau'_{k,1} = 0$ and $\tau'_{k,2} = 0$.

From (5), if $x_3(\tau_k) > 0$, we have $f_{3,k-1}(\tau_k^-) - f_{3,k}(\tau_k^+) = -h_3(\tau_k^-)$
(from (5.4)-(5.1), or (5.5)-(5.2), or (5.6)-(5.3)).
According to (14), the state derivative is

$$x'_{3,i}(\tau_k^+) = \begin{cases} x'_{3,i}(\tau_k^-) & i = 1, 2 \\ x'_{3,i}(\tau_k^-) - h_3(\tau_k^-)\,\zeta_{3,k} & i = 3 \\ x'_{3,i}(\tau_k^-) - h_3(\tau_k^-)\,\rho_{3,k} & i = 4 \end{cases} \tag{23}$$

If $x_3(\tau_k) = 0$, $f_{3,k-1}(\tau_k^-) - f_{3,k}(\tau_k^+) = -\beta_1(\tau_k^+)$ (from
(5.3)-(5.1), or (5.3)-(5.2)). Then,

$$x'_{3,i}(\tau_k^+) = \begin{cases} x'_{3,i}(\tau_k^-) & i = 1, 2 \\ x'_{3,i}(\tau_k^-) - \beta_1(\tau_k^+)\,\zeta_{3,k} & i = 3 \\ x'_{3,i}(\tau_k^-) - \beta_1(\tau_k^+)\,\rho_{3,k} & i = 4 \end{cases} \tag{24}$$



(5) Event $R2G_1$, i.e., the RED light of queue 1 switches
to GREEN. This is an endogenous event that occurs
when $g_k(x(\theta, t), \theta) = z_2(\tau_k) - \theta_2 = 0$. $\tau'_{k,i}$ is determined by
Lemma 1. Similar to the analysis of a $G2R_1$ event, we
have $f_{3,k-1}(\tau_k^-) - f_{3,k}(\tau_k^+) = -\beta_1(\tau_k^+)$ (from (5.3)-(5.1),
or (5.6)-(5.4), or (5.3)-(5.2), or (5.6)-(5.5)). Thus the state
derivative is

$$x'_{3,i}(\tau_k^+) = \begin{cases} x'_{3,i}(\tau_k^-) - \beta_1(\tau_k^+)\,\zeta_{1,k} & i = 1 \\ x'_{3,i}(\tau_k^-) - \beta_1(\tau_k^+)\,\rho_{1,k} & i = 2 \\ x'_{3,i}(\tau_k^-) & i = 3, 4 \end{cases} \tag{25}$$



(6) Event $R2G_3$, i.e., the RED light of queue 3 switches
to GREEN. This is an endogenous event that occurs when
$g_k(x(\theta, t), \theta) = z_4(\tau_k) - \theta_4 = 0$. $\tau'_{k,i}$ is determined by Lemma
2. From (5), we have $f_{3,k-1}(\tau_k^-) - f_{3,k}(\tau_k^+) = h_3(\tau_k^+)$
(from (5.1)-(5.4), or (5.2)-(5.5), or (5.3)-(5.6)). The state
derivative is

$$x'_{3,i}(\tau_k^+) = \begin{cases} x'_{3,i}(\tau_k^-) & i = 1, 2 \\ x'_{3,i}(\tau_k^-) + h_3(\tau_k^+)\,\zeta_{3,k} & i = 3 \\ x'_{3,i}(\tau_k^-) + h_3(\tau_k^+)\,\rho_{3,k} & i = 4 \end{cases} \tag{26}$$



(7) Event $S_1$ starts a NEP of queue 1. As already mentioned,
this is an event induced by a $G2R_1$ event ($e_5$) or a
switch of $\alpha_1(t)$ from zero to a strictly positive value ($e_2$)
occurring during a RED cycle, or a switch of $\alpha_1(t) - \beta_1(t)$
from a non-positive to a strictly positive value ($e_1$) occurring
during a GREEN cycle (see Fig. 4). Consequently,
there are three possible cases to consider as follows.

Case (7a): A NEP of queue 1 starts right after a $G2R_1$
event. This is an endogenous event and was analyzed in
Case (3). Since $x_1(\tau_k) = 0$, $f_{3,k-1}(\tau_k^-) - f_{3,k}(\tau_k^+) = \alpha_1(\tau_k^-)$
(from (5.2)-(5.3) or from (5.5)-(5.6) in (5)). We
get

$$x'_{3,i}(\tau_k^+) = \begin{cases} x'_{3,i}(\tau_k^-) + \alpha_1(\tau_k^-)\,\zeta_{1,k} & i = 1 \\ x'_{3,i}(\tau_k^-) + \alpha_1(\tau_k^-)\,\rho_{1,k} & i = 2 \\ x'_{3,i}(\tau_k^-) & i = 3, 4 \end{cases} \tag{27}$$

Case (7b): A NEP of queue 1 starts while $z_1(\tau_k) = 0$,
$z_2(\tau_k) > 0$. This is an exogenous event occurring during a
RED cycle for queue 1 and is due to a change in $\alpha_1(\tau_k)$
from zero to a strictly positive value. Therefore, $\tau'_{k,i} = 0$.
We then have

$$x'_{3,i}(\tau_k^+) = x'_{3,i}(\tau_k^-), \quad i = 1, 2, 3, 4 \tag{28}$$

Case (7c): A NEP of queue 1 starts while $z_2(\tau_k) = 0$,
$z_1(\tau_k) > 0$. This is an exogenous event occurring during
a GREEN cycle for queue 1 due to a change in $\alpha_1(\tau_k)$
or $\beta_1(\tau_k)$ that results in $\alpha_1(\tau_k) - \beta_1(\tau_k)$ switching from
a non-positive to a strictly positive value. The analysis is
exactly the same as Case (7b) above and (28) applies.

(8) Event $S_3$ starts a NEP of queue 3. This is similar
to Case (7), and there are also three possible cases to
consider.

Case (8a): A NEP of queue 3 starts right after a $G2R_3$
event. This is an endogenous event and was analyzed
in Case (4). Since $x_3(\tau_k) = 0$, we have $f_{3,k-1}(\tau_k^-) - f_{3,k}(\tau_k^+) = -\beta_1(\tau_k^+)$,
and (24) applies. Suppose that this
is the $m$th NEP, i.e., $\tau_k = \xi_{3,m}$. We have already shown in
(21) that $x'_{3,i}(\eta_{3,m-1}^+) = 0$. In addition, we have $x_3(t) = 0$
over the interval $[\eta_{3,m-1}, \xi_{3,m})$, thus $x'_{3,i}(t) = 0$ for all
$t \in [\eta_{3,m-1}, \xi_{3,m})$ and we get $x'_{3,i}(\tau_k^-) = x'_{3,i}(\xi_{3,m}^-) = 0$. The
state derivative in (24) then becomes

$$x'_{3,i}(\tau_k^+) = \begin{cases} 0 & i = 1, 2 \\ -\beta_1(\tau_k^+)\,\zeta_{3,k} & i = 3 \\ -\beta_1(\tau_k^+)\,\rho_{3,k} & i = 4 \end{cases} \tag{29}$$

Case (8b): A NEP of queue 3 starts while $z_3(\tau_k) = 0$,
$z_4(\tau_k) > 0$. This is due to a change in $\alpha_3(\tau_k)$ from zero to
a strictly positive value, which may happen in two ways. First,
$\alpha_3(\tau_k) = \beta_1(\tau_k)$ becomes positive because a $G2R_1$ event
occurs. Then (22) applies, where $x'_{3,i}(\tau_k^-) = 0$. Second,
$\beta_1(\tau_k)$ becomes positive because either $h_1(\tau_k)$ or $\alpha_1(\tau_k)$
switches from 0 to a strictly positive value, which is an
exogenous event. Therefore, the state derivative is

$$x'_{3,i}(\tau_k^+) = x'_{3,i}(\tau_k^-) = 0, \quad i = 1, 2, 3, 4 \tag{30}$$

Case (8c): A NEP of queue 3 starts while $z_4(\tau_k) = 0$,
$z_3(\tau_k) > 0$. This is due to a change in $\alpha_3(\tau_k) - \beta_3(\tau_k)$
from zero to a strictly positive value, which may happen
in two ways. First, it becomes positive because a $G2R_1$
event occurs, which makes $\alpha_3(t)$ larger. Then (22) applies,
where $x'_{3,i}(\tau_k^-) = 0$. Second, it is due to a change of value in
either $h_1(\tau_k)$ or $\alpha_1(\tau_k)$ or $\beta_3(\tau_k)$, which are all exogenous
events. The state derivative is the same as in (30).

This completes the derivation of all state and event time
derivatives required to evaluate the sample performance
derivative in (12). Using the definition of $L_{n,m}(\theta)$ in (8),
note that we can decompose (12) into its NEPs and
evaluate the derivatives $dL_{n,m}(\theta)/d\theta_i$ as shown next.

3.3 Cost Derivatives 

By virtue of (16), $x'_{n,i}(t)$ is piecewise constant during a
NEP and its value changes only at an event point $t^j_{n,m}$,
$j = 1, \ldots, J_{n,m}$. Therefore, we have

$$\frac{dL_{n,m}(\theta)}{d\theta_i} = x'_{n,i}((\xi_{n,m})^+)\,(t^1_{n,m} - \xi_{n,m}) + x'_{n,i}((t^{J_{n,m}}_{n,m})^+)\,(\eta_{n,m} - t^{J_{n,m}}_{n,m}) + \sum_{j=2}^{J_{n,m}} x'_{n,i}((t^{j-1}_{n,m})^+)\,(t^j_{n,m} - t^{j-1}_{n,m}) \tag{31}$$

Clearly, the state derivative at each event point is determined
by (14) which in turn depends on the event type
at $t^j_{n,m}$, $j = 1, \ldots, J_{n,m}$, and is given by the corresponding
expression in (20) through (30). An explicit closed-form
expression of $dL_{n,m}(\theta)/d\theta_i$ may be obtained in this manner
but becomes complicated. In practice, an algorithm that
updates $dL_{n,m}(\theta)/d\theta_i$ after every observed event is simple to
implement. More importantly, note that this IPA derivative
depends on: (i) the number of events in each NEP
$J_{n,m}$, (ii) the total number of $G2R_n$ events $\zeta_{n,k}$, (iii) the
total number of $R2G_n$ events $\rho_{n,k}$, (iv) the event times
$\xi_{n,m}$, $\eta_{n,m}$ and $t^j_{n,m}$, and (v) the arrival and departure
rates $\alpha_n(\tau_k)$, $\beta_n(\tau_k)$ at an event time only. The quantities
in (i)-(iv) are easily observed through counters and
timers. The rates in (v) may be obtained through simple
estimators, emphasizing that they are only needed at a
finite number of observed event times.
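Because the state derivative is piecewise constant, the NEP cost derivative (31) is a sum of rectangle areas. The following is a minimal Python sketch, assuming the event times inside one NEP and the post-event derivative values have already been recorded; the function name `nep_cost_derivative` and the numbers are illustrative only.

```python
def nep_cost_derivative(xi, eta, event_times, x_prime_after):
    """Evaluate (31) for one NEP [xi, eta).

    event_times: times t^1 < ... < t^J of the events inside the NEP
    x_prime_after: J + 1 values; x_prime_after[0] is the state derivative
        just after xi, and x_prime_after[j] is the value just after t^j
    """
    grid = [xi] + list(event_times) + [eta]
    return sum(xp * (t_next - t_prev)
               for xp, t_prev, t_next in zip(x_prime_after, grid[:-1], grid[1:]))


# A NEP on [0, 10) with events at t = 4 and t = 7.
print(nep_cost_derivative(0.0, 10.0, [4.0, 7.0], [1.0, 3.0, -2.0]))
```

An on-line implementation would not store the lists; it would add one rectangle to a running total each time an event is observed, which is the event-driven update described above.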

4. SIMULATION RESULTS 

We describe how the IPA estimator derived for the SFM 
can be used to determine optimal light cycles for two inter- 
sections modeled as a DES. We apply the IPA estimator 
using actual data from an observed sample path of this 
DES (in this case, by simulating as a pure DES). 

We assume cars arrive according to a Poisson process
with rate $\alpha_n$, $n = 1, 2, 4$ (as already emphasized, our
results are independent of this distribution). We also
assume cars depart at a rate $h_n(t)$ which we fix to be a
constant $H_n$ when the road is not empty. We also constrain
$\theta_i$, $i = 1, \ldots, 4$, to take values in $[\theta_{\min}, \theta_{\max}]$.

For the simulated DES model, we use a brute-force (BF)
method to find an optimal $\theta^*_{BF}$: we discretize the range
of each $\theta_i$ and, for each combination of $\theta_i$, $i = 1, \ldots, 4$, we run 10 sample
paths to obtain the average total cost. The value of $\theta^*_{BF}$ is
the one generating the least average cost, to be compared
to $\theta^*_{IPA}$, the value obtained by the IPA-based method. In our simulations, we
estimate $\alpha_n(\tau_k)$ through $N_a / t_w$ by counting car arrivals
$N_a$ over a time window $t_w$ before or after $\xi_{n,m}$; $\beta_n(\tau_k)$ is
similarly estimated.
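The window-based rate estimate can be sketched as a count divided by the window length. The following minimal Python example assumes the arrival timestamps are available as a sorted list; the function name `estimate_rate`, the inclusive window endpoints, and the sample data are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def estimate_rate(arrival_times, t_event, window, before=True):
    """Estimate a flow rate at t_event as N_a / t_w, where N_a is the number
    of arrivals in a window of length t_w before (or after) t_event.

    arrival_times must be sorted in increasing order.
    """
    if before:
        lo, hi = t_event - window, t_event
    else:
        lo, hi = t_event, t_event + window
    count = bisect_right(arrival_times, hi) - bisect_left(arrival_times, lo)
    return count / window


arrivals = [1.0, 3.0, 4.0, 8.0, 9.0, 12.0]
print(estimate_rate(arrivals, 10.0, 8.0))                # window [2, 10]
print(estimate_rate(arrivals, 10.0, 8.0, before=False))  # window [10, 18]
```

The window length trades off noise against responsiveness: a short window tracks rate changes near the event time but counts few vehicles.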

In the results reported here, we set $\alpha_n = 1/4$, $n = 1, 2, 4$,
$H_n = 1$, $n = 1, 2, 3, 4$, $\theta_{\min} = 15$ sec, $\theta_{\max} = 40$ sec and the
sample length $T = 1000$ sec. Fig. 5 shows the trajectories of
$J$ and $\theta$ using the IPA-based method where $w = [10, 1, 1, 1]$
and initial $\theta_0 = [25, 30, 30, 25]$. More results are shown
in Table 1. As we can see, $\theta^*_{IPA}$ is approaching the
optimal value obtained by the BF method. Notice that the
BF method becomes impractical when there are more
control parameters, or when the range of the parameters
is large. However, the IPA method is still effective in such
situations. Moreover, we notice that the value of $(\theta_3 + \theta_4)$
is similar to $(\theta_1 + \theta_2)$. This indicates that the two
intersections tend to have the same traffic light switching
cycle to balance traffic flows.

Table 1. IPA vs BF method result I

w            BF: $\theta^*$    BF: $J^*$   IPA: $\theta^*$            IPA: $J^*$
[1,1,1,1]    [15,15,15,15]    5.4         [15,15,15,15]             5.4
[10,1,1,1]   [27,15,15,29]    16.6        [28.8,15,15,27.8]         17.5
[1,5,5,1]    [15,23,17,21]    12.6        [15.1,18.6,15.6,18.5]     13.2
[5,1,1,10]   [25,15,15,25]    22.0        [22.1,15,15,22.9]         22.5
[1,10,1,1]   [15,29,15,29]    16.3        [15,31.2,18.1,26.6]       17.2





Fig. 5. IPA-based TLC result I

Based on this observation, we also do simulations by
setting $\theta_1 + \theta_2 = T_1$ and $\theta_3 + \theta_4 = T_2$, which indicates
that we set the "GREEN plus RED" cycle to be fixed at
$T_i$ for each intersection. With this constraint, we only
need to find optimal $\theta_1^*$ and $\theta_3^*$, since $\theta_2^* = T_1 - \theta_1^*$ and
$\theta_4^* = T_2 - \theta_3^*$. We first let $T_1 = T_2$, which restricts the
two intersections to have the same traffic light switching
cycle. Table 2 shows the simulation results. $T_1$ and $T_2$ are
set to be the value obtained from Table 1. For example,
when $w = [10, 1, 1, 1]$, $\theta_1 + \theta_2 = 42$ and $\theta_3 + \theta_4 = 44$
in Table 1. We then set $T_1 = T_2 = 44$ in Table 2, and
restrict $\theta_{\min} = 15$. Comparing the results in Table 2 with
Table 1, we find that it supports the results where we allow
independent $\theta_i$, $i = 1, \ldots, 4$.

In Fig. 6, we set $w = [1, 10, 1, 1]$ and change $T_2$ to obtain
$J^*_{IPA}$ while keeping $T_1 = 44$. As we can see, the minimum
$J^*_{IPA}$ is achieved when $T_1 = T_2$, which also matches the
observations under independent $\theta_i$.

Fig. 6. $J^*_{IPA}$ at different $T_2$

Table 2. IPA vs BF method result II

w            $[T_1, T_2]$   BF: $[\theta_1^*, \theta_3^*]$   BF: $J^*$   IPA: $[\theta_1^*, \theta_3^*]$   IPA: $J^*$
[1,1,1,1]    [30,30]        [15,15]                        5.4         [15,15]                         5.4
[10,1,1,1]   [44,44]        [31,20]                        16.2        [30.4,21.6]                     17.6
[1,5,5,1]    [39,39]        [15,16]                        12.2        [15,15.6]                       14.1
[5,1,1,10]   [40,40]        [25,15]                        24.3        [23.1.6,12.5]                   24.3
[1,10,1,1]   [44,44]        [15,16]                        17.5        [15,15.2]                       17.6

We are also interested in the optimal control parameters
under different traffic intensities. We set $w =
[1, 10, 1, 1]$, $T_1 = T_2 = 44$ and $\alpha = [1/r, 1/4, 1/4]$, i.e., we
operate under different arrival rates for queue 1. Fig. 7 shows
the optimal cost and optimal $\theta_1$ and $\theta_3$ while $r$ varies. It is
clear that $\theta_1$ increases as $r$ decreases. This indicates that
more GREEN light duration is assigned to queue 1 as more
cars accumulate in queue 1 because of the faster arrival
rate. $\theta_3$ also increases because more cars flow into queue
3.

Fig. 7. $J^*_{IPA}$ and $\theta^*_{IPA}$ at different $r$

It must be pointed out that the BF method does not 
provide a "true" optimal, since the DES model of the 
traffic system is as much an approximation as the SFM 
based on which IPA operates. Thus, the comparative 
results should be interpreted accordingly. 

5. CONCLUSIONS AND FUTURE WORK 

We have developed a SFM for a traffic light control problem
with two coupled intersections, based on which we
derive an IPA gradient estimator of a cost metric with
respect to the controllable green and red cycle lengths.
The estimators are used to iteratively adjust light cycle
lengths to improve performance and, under proper conditions,
obtain optimal values which adapt to changing
traffic conditions. The analysis in the paper can be readily
extended to N intersections in tandem. Future work will
extend our method to solving the TLC problem over multiple
junctions without assumption (iii), i.e., allowing a
finite car capacity between intersections to cause blocking
effects.